
    Asymptotic properties of the sequential empirical ROC, PPV and NPV curves under case-control sampling

    The receiver operating characteristic (ROC) curve, the positive predictive value (PPV) curve and the negative predictive value (NPV) curve are three measures of performance for a continuous diagnostic biomarker. The ROC, PPV and NPV curves are often estimated empirically to avoid assumptions about the distributional form of the biomarkers. Recently, there has been a push to incorporate group sequential methods into the design of diagnostic biomarker studies. A thorough understanding of the asymptotic properties of the sequential empirical ROC, PPV and NPV curves will provide more flexibility when designing group sequential diagnostic biomarker studies. In this paper, we derive asymptotic theory for the sequential empirical ROC, PPV and NPV curves under case-control sampling using sequential empirical process theory. We show that the sequential empirical ROC, PPV and NPV curves converge to the sum of independent Kiefer processes and show how these results can be used to derive asymptotic results for summaries of the sequential empirical ROC, PPV and NPV curves. Comment: Published at http://dx.doi.org/10.1214/11-AOS937 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
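
    As a rough illustration of what estimating these curves "empirically" can look like at a single threshold, the sketch below computes the empirical true and false positive rates from case and control samples and then converts them to PPV and NPV with Bayes' rule. The function names, the threshold indexing, and the externally supplied prevalence (needed because case-control sampling does not identify it) are assumptions made for this example, not the paper's notation.

```python
from typing import Sequence, Tuple


def empirical_rates(cases: Sequence[float], controls: Sequence[float],
                    threshold: float) -> Tuple[float, float]:
    """Empirical (TPR, FPR): fraction of each sample scoring above the threshold."""
    tpr = sum(x > threshold for x in cases) / len(cases)
    fpr = sum(x > threshold for x in controls) / len(controls)
    return tpr, fpr


def predictive_values(tpr: float, fpr: float,
                      prevalence: float) -> Tuple[float, float]:
    """PPV and NPV from TPR, FPR and an assumed disease prevalence (Bayes' rule)."""
    p_pos = prevalence * tpr + (1 - prevalence) * fpr              # P(test positive)
    p_neg = prevalence * (1 - tpr) + (1 - prevalence) * (1 - fpr)  # P(test negative)
    ppv = prevalence * tpr / p_pos if p_pos > 0 else float("nan")
    npv = (1 - prevalence) * (1 - fpr) / p_neg if p_neg > 0 else float("nan")
    return ppv, npv


# Toy data: biomarker values for four cases and four controls.
tpr, fpr = empirical_rates([3, 4, 5, 6], [1, 2, 3, 4], threshold=3.5)
ppv, npv = predictive_values(tpr, fpr, prevalence=0.5)
```

    Sweeping the threshold over the observed biomarker values would trace out the full empirical curves.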

    Structured penalties for functional linear models---partially empirical eigenvectors for regression

    One of the challenges with functional data is incorporating spatial structure, or local correlation, into the analysis. This structure is inherent in the output from an increasing number of biomedical technologies, and a functional linear model is often used to estimate the relationship between the predictor functions and scalar responses. Common approaches to the ill-posed problem of estimating a coefficient function typically involve two stages: regularization and estimation. Regularization is usually done via dimension reduction, projecting onto a predefined span of basis functions or a reduced set of eigenvectors (principal components). In contrast, we present a unified approach that directly incorporates spatial structure into the estimation process by exploiting the joint eigenproperties of the predictors and a linear penalty operator. In this sense, the components in the regression are 'partially empirical' and the framework is provided by the generalized singular value decomposition (GSVD). The GSVD clarifies the penalized estimation process and informs the choice of penalty by making explicit the joint influence of the penalty and predictors on the bias, variance, and performance of the estimated coefficient function. Laboratory spectroscopy data and simulations are used to illustrate the concepts. Comment: 29 pages, 3 figures, 5 tables; typo/notational errors edited and intro revised per journal review process.
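
    For orientation, a penalized estimate with a linear penalty operator is commonly written as a generalized ridge problem. The display below is a sketch under assumed notation ($X$ for the discretized predictor functions, $y$ for the scalar responses, $L$ for the penalty operator, $\alpha > 0$ for the tuning parameter); the abstract does not fix these symbols.

```latex
\hat{\beta}_{\alpha,L}
  = \arg\min_{\beta} \left\{ \lVert y - X\beta \rVert^{2}
      + \alpha \lVert L\beta \rVert^{2} \right\}
  = \left( X^{\top}X + \alpha L^{\top}L \right)^{-1} X^{\top} y ,
\qquad \text{provided } X^{\top}X + \alpha L^{\top}L \text{ is invertible.}
```

    Because $X$ and $L$ enter only through $X^{\top}X$ and $L^{\top}L$, a decomposition that diagonalizes both at once, such as the GSVD of the pair $(X, L)$, makes the joint effect of predictors and penalty on the estimate explicit.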

    Asymptotic Properties of the Sequential Empirical ROC and PPV Curves

    The receiver operating characteristic (ROC) curve, the positive predictive value (PPV) curve and the negative predictive value (NPV) curve are three common measures of performance for a diagnostic biomarker. The independent increments covariance structure assumption is common in the group sequential study design literature. Showing that summary measures of the ROC, PPV and NPV curves have an independent increments covariance structure will provide the theoretical foundation for designing group sequential diagnostic biomarker studies. The ROC, PPV and NPV curves are often estimated empirically to avoid assumptions about the distributional form of the biomarkers. In this paper we derive asymptotic theory for the sequential empirical ROC, PPV and NPV curves. These results are used to show that the independent increments assumption holds for some summary measures of the ROC, PPV and NPV curves when estimated empirically.

    Pooling Community Data for Community Interventions When the Number of Pairs is Small

    There is considerable interest in community interventions for health promotion, where the community is the experimental unit. Because such interventions are expensive, the number of experimental units (communities) is usually very small, yielding a study with low power. We examined the ability of a process known as “pooling” or “preliminary significance testing” to improve the power of such studies. In this process, one first tests whether there is significant community variation, using a type I error rate of perhaps 0.25. If there is significant variation, the usual community-level test is performed. If not, a person-level test is performed. We found through Monte Carlo simulation that for studies with 2, 3, or 4 communities per group, this procedure could improve power somewhat in situations where the community by time variation is known to be small. Estimates of community by time variation for a variety of health variables are also presented. Because of the limited information available on community variances, and the probable difficulties in defending a person-level analysis, we recommend against the pooling procedure at this time.
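
    The two-stage decision rule described in this abstract can be sketched as follows. This is a minimal illustration only: the function name, the callables standing in for the two intervention tests, and the way the preliminary p-value is supplied are all hypothetical, and the abstract's own conclusion is that the procedure is not recommended.

```python
from typing import Callable, Tuple


def pooled_analysis(
    p_variation: float,
    community_level_test: Callable[[], float],
    person_level_test: Callable[[], float],
    alpha_prelim: float = 0.25,
) -> Tuple[str, float]:
    """Preliminary significance testing ("pooling").

    p_variation is the p-value from the preliminary test of community
    variation. Each callable returns the p-value of the intervention
    effect at that level of analysis (both are placeholders here).
    Returns the analysis level used and the resulting p-value.
    """
    if not 0.0 <= p_variation <= 1.0:
        raise ValueError("p_variation must lie in [0, 1]")
    if p_variation < alpha_prelim:
        # Significant community variation: keep the community as the unit.
        return "community", community_level_test()
    # No significant variation: pool and analyse at the person level.
    return "person", person_level_test()


# Hypothetical p-values, for illustration only.
level, p_value = pooled_analysis(0.60, lambda: 0.20, lambda: 0.01)
```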

    A new family-based association test via a least-squares method

    To test the association between a dichotomous phenotype and a genetic marker based on family data, we propose a least-squares method using the vector of phenotypes and their cross products within each family. This new approach allows covariate adjustment and is numerically much simpler to implement than likelihood-based methods. The new approach is asymptotically equivalent to the generalized estimating equation approach with a diagonal working covariance matrix, thus avoiding some difficulties with the working covariance matrix reported previously in the literature. When applied to the data from the Collaborative Study on the Genetics of Alcoholism, this new method shows a significant association between the marker rs1037475 and alcoholism.

    Reliability, Effect Size, and Responsiveness and Intraclass Correlation of Health Status Measures Used in Randomized and Cluster-Randomized Trials

    Background: New health status instruments are described by psychometric properties, such as Reliability, Effect Size, and Responsiveness. For cluster-randomized trials, another important statistic is the Intraclass Correlation for the instrument within clusters. Studies using better instruments can be performed with smaller sample sizes, but better instruments may be more expensive in terms of dollars, lost opportunities, or poorer data quality due to the response burden of longer instruments. Investigators often need to estimate the psychometric properties of a new instrument, or of an established instrument in a new setting. Optimal sample sizes for estimating these properties have not been studied in detail. Methods: We examined the power of a two-sample test as a function of the Reliability, Effect Size, Responsiveness, and Intraclass Correlation of the instrument. We calculated the “cost-effectiveness” of using a 1-item versus a 5-item measure of mental health status. We also used simulation to determine formulas for the sample size needed to estimate the psychometric statistics accurately. Findings: Under the usual model for measurement error, the psychometric statistics are all functions of the same error term. In randomized trials, a poorer instrument can achieve the desired power if the number of persons per treatment group is increased. In cluster-randomized trials, adequate power may be obtained by increasing the number of clusters per treatment group (and often the number of persons per cluster), as well as by choosing a better instrument. The 1-item measure of mental health status may be more cost-effective than the 5-item measure in some settings. Most published psychometric values are situation-specific. Very large samples are required to estimate Responsiveness and the Intraclass Correlation accurately. Conclusion: If the goal is to diagnose or refer individual patients, an instrument with high Validity and Reliability is needed. In settings where the sample sizes can be increased easily, less reliable instruments may be cost-effective. It is likely that many values of published psychometric statistics were derived from samples too small to provide accurate values, or are importantly specific to the setting in which they were derived. Note: A paper based on some of the material in this technical report has been published. (Diehr P, Chen L, Patrick D, Feng Z, Yasui Y. Reliability, effect size, and responsiveness of health status measures in the design of randomized and cluster-randomized trials. Contemporary Clinical Trials. 2005; 26:45-58. B). That paper does not include the material on estimating the sample size required to provide an accurate estimate of the reliability of a new instrument. That material is included in this technical report.

    Modelā€free scoring system for risk prediction with application to hepatocellular carcinoma study

    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/142930/1/biom12750_am.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/142930/2/biom12750-sup-0001-SuppData-S1.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/142930/3/biom12750.pd

    An Automated Peak Identification/Calibration Procedure for High-Dimensional Protein Measures From Mass Spectrometers

    Discovery of “signature” protein profiles that distinguish disease states (e.g., malignant, benign, and normal) is a key step towards translating recent advancements in proteomic technologies into clinical utilities. Protein data generated from mass spectrometers are, however, large in size and have complex features due to complexities in both biological specimens and interfering biochemical/physical processes of the measurement procedure. Making sense out of such high-dimensional complex data is challenging and necessitates the use of a systematic data analytic strategy. We propose here a data processing strategy for two major issues in the analysis of such mass-spectrometry-generated proteomic data: (1) separation of protein “signals” from background “noise” in protein intensity measurements and (2) calibration of protein mass/charge measurements across samples. We illustrate the two issues and the utility of the proposed strategy using data from a prostate cancer biomarker discovery project as an example.

    DMseg: a Python algorithm for de novo detection of differentially or variably methylated regions

    Detecting and assessing statistical significance of differentially methylated regions (DMRs) is a fundamental task in methylome association studies. While the average differential methylation in different phenotype groups has been the inferential focus, methylation changes in chromosomal regions may also present as differential variability, i.e., variably methylated regions (VMRs). Testing statistical significance of regional differential methylation is a challenging problem, and existing algorithms do not provide accurate type I error control for genome-wide DMR or VMR analysis. No algorithm has been publicly available for detecting VMRs. We propose DMseg, a Python algorithm with efficient DMR/VMR detection and significance assessment for array-based methylome data, and compare its performance to Bumphunter, a popular existing algorithm. Operationally, DMseg searches for DMRs or VMRs within CpG clusters that are adaptively determined by both gap distance and correlation between contiguous CpG sites in a microarray. Levene's test is implemented for assessing differential variability of individual CpGs. A likelihood ratio statistic is proposed to test for a constant difference within CpGs in a DMR or VMR to summarize the evidence of regional difference. Using a stratified permutation scheme and pooling null distributions of LRTs from clusters with similar numbers of CpGs, DMseg provides accurate control of the type I error rate. In simulation experiments, DMseg shows greater power than Bumphunter to detect DMRs. Application to methylome data of Barrett's esophagus and esophageal adenocarcinoma reveals a number of DMRs and VMRs of biological interest.
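
    To make the clustering step concrete, the sketch below groups contiguous CpG sites on one chromosome, starting a new cluster whenever the gap to the previous site is too large or the correlation with it is too low. It is not DMseg's actual code or API: the function name, the input layout, and the thresholds `max_gap` and `min_corr` are hypothetical values chosen for illustration.

```python
from typing import List, Sequence


def cluster_cpgs(positions: Sequence[int], adjacent_corr: Sequence[float],
                 max_gap: int = 500, min_corr: float = 0.2) -> List[List[int]]:
    """Group CpG indices into clusters of contiguous, correlated sites.

    positions: genomic coordinates on one chromosome, sorted ascending.
    adjacent_corr[i]: correlation between site i and site i + 1.
    max_gap, min_corr: illustrative thresholds (assumptions, not DMseg defaults).
    """
    if len(adjacent_corr) != max(len(positions) - 1, 0):
        raise ValueError("need exactly one correlation per adjacent pair")
    clusters: List[List[int]] = []
    current: List[int] = []
    for i, pos in enumerate(positions):
        if current:
            too_far = pos - positions[i - 1] > max_gap
            uncorrelated = adjacent_corr[i - 1] < min_corr
            if too_far or uncorrelated:
                # Close the running cluster and start a new one at site i.
                clusters.append(current)
                current = []
        current.append(i)
    if current:
        clusters.append(current)
    return clusters


# Toy example: a large gap after site 1 and a weak correlation before site 4.
clusters = cluster_cpgs([100, 200, 1000, 1100, 1150], [0.9, 0.8, 0.7, 0.1])
```

    Region-level statistics, such as the likelihood ratio test mentioned above, would then be computed within each cluster.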